A new term-weighting scheme for text classification using the odds of positive and negative class probabilities

Author

  • Youngjoong Ko
Abstract

Text classification is a core technique for text mining and information retrieval, and it has been applied in many research and industrial areas. To obtain high classification performance, a term weighting scheme has to assign an appropriate weight to each term. Although term weighting is an important module of text classification, and text classification has peculiarities that differ from those of information retrieval, many term weighting schemes from information retrieval, such as tf.idf, have been used in text classification unchanged. The peculiarity that most distinguishes text classification from information retrieval is the existence of class information. Therefore, this paper proposes a new term weighting scheme that utilizes class information through positive and negative class distributions. The proposed scheme, log tf.TRR, consistently performs better than other schemes that use class information, as well as traditional schemes such as tf.idf.
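To make the contrast in the abstract concrete, the following minimal Python sketch compares classic tf.idf, which ignores class labels, with a class-aware weight built from the ratio of positive to negative class term probabilities. The abstract does not give the log tf.TRR formula, so the function `class_odds_weights`, its `alpha` parameter, the add-one smoothing, and the toy documents are all illustrative assumptions and should not be read as the paper's definition.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic tf.idf: tf * log(N / df). Class labels are not used."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]


def class_odds_weights(docs, labels, positive, alpha=1.0):
    """Hypothetical class-aware weight (an assumption, not the paper's formula):
    log(1 + tf) * log(alpha + P(t | positive) / P(t | negative)),
    with add-one smoothing on both class probabilities."""
    pos = Counter(t for d, y in zip(docs, labels) if y == positive for t in d)
    neg = Counter(t for d, y in zip(docs, labels) if y != positive for t in d)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    ratio = {t: ((pos[t] + 1) / pos_total) / ((neg[t] + 1) / neg_total)
             for t in vocab}
    return [{t: math.log(1 + c) * math.log(alpha + ratio[t])
             for t, c in Counter(d).items()}
            for d in docs]


# Toy corpus (made up for illustration): two "sport" and two "politics" documents.
docs = [["goal", "match", "team"], ["goal", "score"],
        ["vote", "election", "team"], ["vote", "party"]]
labels = ["sport", "sport", "politics", "politics"]

plain = tf_idf(docs)
aware = class_odds_weights(docs, labels, positive="sport")
```

In this toy corpus, "goal" and "team" each occur in two of the four documents, so tf.idf cannot tell them apart. The class-aware weight ranks "goal" above "team" for the "sport" class because "goal" appears only in positive documents, while "team" is split evenly between the two classes.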

Similar resources

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The K nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

Analytical evaluation of term weighting schemes for text categorization

An analytical evaluation of six widely used term weighting techniques for text categorization is presented. The analysis depends on expressing the term weights using term occurrence probabilities in positive and negative categories. The weighting behaviors of the schemes considered are firstly clarified by analyzing the relation between the occurrence probabilities of terms which rece...

A Novel One Sided Feature Selection Method for Imbalanced Text Classification

Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat minority class data as outliers. The text data is one of t...

An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management require a tool that delivers the information users are searching for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...


Journal:
  • JASIST

Volume 66  Issue 

Pages  -

Publication date 2015